⚗️ Collect WebSocket resource events#4718
Conversation
|
Warning This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
This stack of pull requests is managed by Graphite. Learn more about stacking. |
|
Bundles Sizes Evolution
|
f9f774b to
f67e33e
Compare
ec073a8 to
98e75c8
Compare
f67e33e to
adaf9f3
Compare
98e75c8 to
1afcc3a
Compare
adaf9f3 to
cb90533
Compare
1afcc3a to
9b239da
Compare
cb90533 to
7fc9595
Compare
7fc9595 to
770842b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 770842b8f7
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| type: RumEventType.RESOURCE, | ||
| resource: { | ||
| id: generateUUID(), | ||
| type: ResourceType.WEBSOCKET, |
There was a problem hiding this comment.
Add schema support before emitting websocket resources
The generated RUM event schema still only allows the existing resource types (see packages/browser-rum-core/src/rumEvent.types.ts:665-676), and this PR also skips websocket resources in format validation (packages/browser-rum-core/test/formatValidation.ts:25-27). When track_web_sockets is enabled, events assembled here have resource.type: "websocket", so they do not match the SDK/intake schema and can be rejected or hidden by downstream validation. Please land/regenerate schema support before emitting this resource type.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
This is a prototype and the data collected is subject to change, downstream validation has been handled.
| 'send', | ||
| ({ target: instance, parameters: [data], onPostCall }) => { | ||
| const size = computePayloadSize(data) | ||
| const bufferedAmountPreSend = instance.bufferedAmount |
There was a problem hiding this comment.
Sample bufferedAmount after send
For a single large send, bufferedAmount is often 0 before the native send() call and only increases once that call queues the payload, so sampling only bufferedAmountPreSend makes buffered_amount_max miss the last/only write and under-report the real queue peak. Capture the value in the post-call path (or include both before and after) before updating the max.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Sampling bufferedAmountPreSend allows us to spot back pressure issues, if we instrument the post-call path, we'll just see the bufferedAmount grow by the size of the message that has just been sent.
|
|
||
| if (isExperimentalFeatureEnabled(ExperimentalFeature.TRACK_WEB_SOCKETS)) { | ||
| const webSocketCollection = startWebSocketCollection(lifeCycle, viewHistory, vitalCollection.addDurationVital) | ||
| cleanupTasks.push(webSocketCollection.stop) |
There was a problem hiding this comment.
Flush open websockets before stopping resources
When RUM is stopped while a socket is still open, cleanup tasks run in registration order: resourceCollection.stop() is registered earlier and stops its task queue before this later webSocketCollection.stop() emits WEBSOCKET_COMPLETED for open connections. That means the intended session_end resource is queued after the resource pipeline is already stopped and can be lost; flush websocket connections before stopping the resource collection/batch, or avoid emitting them during stop.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
In practice, the stop() cleanup isn't ran.
Also, the only way resourceCollection.stop() prevents a websocket event from being collected is when there's a corresponding task in taskQueue which is an unlikely window.
770842b to
4251e33
Compare
|
/to-staging |
|
View all feedbacks in Devflow UI.
Commit 4251e33ae1 will soon be integrated into staging-25.
Commit 4251e33ae1 has been merged into staging-25 in merge commit 07d8c51b44. If you need to revert this integration, you can use the following command: |
Integrated commit sha: 4251e33 Co-authored-by: bdibon <boris.dibon@datadoghq.com>
| state: 'connecting', | ||
| instance, | ||
| url, | ||
| ...(protocols !== undefined ? { protocols } : {}), |
There was a problem hiding this comment.
💬 suggestion: or is it needed?
| ...(protocols !== undefined ? { protocols } : {}), | |
| protocols, |
Same for similar patterns.
There was a problem hiding this comment.
I will remove the usage of this pattern, the idea was to not pass objects with [key, undefined] entries, but:
- this is not dramatic as it gets stripped out by the
sanitize()function in the end, - and it's never used in the codebase.
There was a problem hiding this comment.
However, for this specific line, I will make a shallowCopy of the array before passing it around as a safety measure so we don't risk mutating it.
|
|
||
| function computePayloadSize(data: unknown): number { | ||
| if (typeof data === 'string') { | ||
| return new TextEncoder().encode(data).byteLength |
There was a problem hiding this comment.
💬 suggestion: what about using computeBytesCount utility?
| if (data instanceof ArrayBuffer) { | ||
| return data.byteLength | ||
| } | ||
| if (ArrayBuffer.isView(data)) { | ||
| return data.byteLength | ||
| } |
There was a problem hiding this comment.
🥜 nitpick: what about grouping similar returns?
| if (data instanceof ArrayBuffer) { | |
| return data.byteLength | |
| } | |
| if (ArrayBuffer.isView(data)) { | |
| return data.byteLength | |
| } | |
| if (data instanceof ArrayBuffer || ArrayBuffer.isView(data)) { | |
| return data.byteLength | |
| } |
There was a problem hiding this comment.
💬 suggestion: what about moving this file and its tests to the resource directory?
There was a problem hiding this comment.
Should we move packages/browser-rum-core/src/domain/requestCollection.ts as well?
| IMAGE: 'image', | ||
| FONT: 'font', | ||
| MEDIA: 'media', | ||
| WEBSOCKET: 'websocket', |
| startClocks: event.startClocks, | ||
| duration, | ||
| rawRumEvent, | ||
| domainContext: {}, |
There was a problem hiding this comment.
💭 thought: could it be worth providing the web socket instance in the domain context?
if yes, it could be added later down the road
There was a problem hiding this comment.
I don't see any value right now. What do you have in mind?
| last_message_in_at?: TimeStamp | ||
| longest_inbound_silence: ServerDuration | ||
| inbound_idle_duration_before_close?: ServerDuration | ||
| buffered_amount_max?: number |
There was a problem hiding this comment.
💬 suggestion: it seems to always be defined
| buffered_amount_max?: number | |
| buffered_amount_max: number |
| WebSocketOpenContext, | ||
| WebSocketMessageInContext, | ||
| WebSocketMessageOutContext, | ||
| WebSocketClosedContext as WebSocketCloseContext, |
There was a problem hiding this comment.
💬 suggestion: we should keep the same naming
| const sessionExpiredSubscription = lifeCycle.subscribe(LifeCycleEventType.SESSION_EXPIRED, () => { | ||
| tracker.flushOpenConnections('session_end') | ||
| }) |
There was a problem hiding this comment.
❓ question: what do we want to do for websocket connection that spans across multiple sessions?
There was a problem hiding this comment.
For the prototype we'll just cut the tracking with the reason "session_end".
| stop: () => void | ||
| } | ||
|
|
||
| export function startWebSocketCollection( |
There was a problem hiding this comment.
💬 suggestion: IMO, we should have some e2e tests to:
- exercise a real instrumentation
- assert the expected events are emitted: vital + resource
- ensure it reacts as intended to sdk lifecycle (start/end view, session expiration/reewal)
a67b7fb to
9e9b3ca
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9e9b3cad80
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| context: sanitize({ | ||
| url: context.url, | ||
| protocols: context.protocols, | ||
| startViewId, |
There was a problem hiding this comment.
Respect trackResources for websocket vitals
When trackResources: false is combined with enableExperimentalFeatures: ['track_web_sockets'], this vital is still emitted for every connection and its context includes the raw WebSocket URL/protocols. The resource event is marked discarded elsewhere, but this separate vital is not, so customers who disabled resource collection still send WebSocket endpoint data unless they add custom beforeSend redaction.
Useful? React with 👍 / 👎.
| const sessionExpiredSubscription = lifeCycle.subscribe(LifeCycleEventType.SESSION_EXPIRED, () => { | ||
| tracker.flushOpenConnections('session_end') | ||
| }) |
There was a problem hiding this comment.
Flush open sockets before page-exit flush
When a page is closed or navigated while a long-lived WebSocket is still open, browsers do not guarantee that a close event runs before the SDK's unload flush. This collection only finalizes open sockets on session expiration or SDK stop, while the existing page-exit path emits PAGE_MAY_EXIT immediately before flushing the batch, so the common open-on-unload case produces no websocket resource to include in that final flush.
Useful? React with 👍 / 👎.
| globalObject, | ||
| 'WebSocket', | ||
| ({ parameters, onPostCall }) => { | ||
| const url = String(parameters[0]) |
There was a problem hiding this comment.
Normalize WebSocket URLs after construction
When an app uses a relative WebSocket URL such as new WebSocket('/ws'), or an http(s) URL in browsers that translate it to ws(s), this captures the pre-construction argument instead of the constructed socket's resolved instance.url. The resulting RUM resource URL can be /ws or https://... rather than the actual ws(s)://host/..., making URL grouping, filtering, and redaction inconsistent with other resource events.
Useful? React with 👍 / 👎.
| AFTER_VIEW_ENDED, | ||
| REQUEST_STARTED, | ||
| REQUEST_COMPLETED, | ||
| WEBSOCKET_COMPLETED, |
There was a problem hiding this comment.
Append lifecycle events to preserve const enum values
LifeCycleEventType is exported from @datadog/browser-rum-core, and as a const enum its numeric values are inlined by downstream packages or integrations compiled against the declarations. Inserting WEBSOCKET_COMPLETED before SESSION_EXPIRED shifts every following value, so mixed-version code can subscribe to or notify the wrong lifecycle event; append the new member instead.
Useful? React with 👍 / 👎.
| IMAGE: 'image', | ||
| FONT: 'font', | ||
| MEDIA: 'media', | ||
| WEBSOCKET: 'websocket', |
There was a problem hiding this comment.
Avoid exposing websocket as a manual resource type
Because ResourceOptions.type uses this ResourceType union, adding WEBSOCKET here also lets public startResource/stopResource calls emit resource.type: 'websocket' without any resource.websocket metrics. Those manual events then look like websocket connection resources but are missing the fields this change's consumers expect, so the new type should stay internal to automatic websocket collection or be rejected by the manual-resource path.
Useful? React with 👍 / 👎.
Run `yarn start` (nuxt preview) for both Nuxt test apps in Playwright so local runs match CI and avoid vite-node IPC flakes under parallel workers. Bind preview to fixed ports to avoid clashes with common 3000/3001 listeners. Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
9e9b3ca to
6ffe09f
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 6ffe09f7cb
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const tracker = trackWebSocket(lifeCycle, initWebSocketObservable(), viewHistory, addDurationVital) | ||
|
|
||
| const sessionExpiredSubscription = lifeCycle.subscribe(LifeCycleEventType.SESSION_EXPIRED, () => { | ||
| tracker.flushOpenConnections('session_end') |
There was a problem hiding this comment.
Keep tracking open sockets after renewal
When a session expires while a WebSocket remains open, this call emits a session_end resource and flushOpenConnections clears the registry. The browser keeps using the same WebSocket instance, so after the session is renewed later message-* and close contexts have no new connecting event to re-register the socket and are ignored, causing long-lived sockets to stop producing websocket resources for the rest of the page. Keep or re-seed open connections across the session boundary so the renewed session can still be measured.
Useful? React with 👍 / 👎.
| longest_inbound_silence: toServerDuration(event.longestInboundSilence), | ||
| inbound_idle_duration_before_close: toServerDuration(event.inboundIdleDurationBeforeClose), | ||
| buffered_amount_max: event.bufferedAmountMax, | ||
| protocol: event.protocol, |
There was a problem hiding this comment.
Allow redacting websocket subprotocols
When an application uses the WebSocket subprotocol for customer data such as an auth/session token, this sends the negotiated value in resource.websocket.protocol, but beforeSend cannot redact it because the resource whitelist only allows resource.websocket.close_reason among websocket fields. Add this field to the modifiable paths or omit it so customers are not forced to drop the whole event to remove sensitive subprotocol values.
Useful? React with 👍 / 👎.
| viewHistory: ViewHistory | ||
| ): WebSocketCompleteEvent { | ||
| const endClocks = endInfo.at | ||
| const endViewId = viewHistory.findView(endClocks.relative)?.id |
There was a problem hiding this comment.
Preserve end view for session-end sockets
For sockets finalized on session expiration, startViewCollection subscribes to SESSION_EXPIRED before this collection is started, so the active view is closed in viewHistory before flushOpenConnections builds the websocket event. This lookup uses a fresh end timestamp after that close and can return undefined, leaving resource.websocket.end_view_id absent for session_end resources even though the socket ended in the current view; flush before closing the view or look up the last active view for this path.
Useful? React with 👍 / 👎.


Motivation
WebSocket connections are a blind spot in RUM resource monitoring. This PR adds WebSocket tracking behind the
track_web_socketsexperimental feature flag, collecting per-connection metrics (message counts, sizes, timing, handshake duration, close codes) and reporting them asresourceevents withresource.type: "websocket".Changes
packages/browser-corewebSocketObservable: Instruments theWebSocketconstructor andsendmethod viainstrumentConstructor/instrumentMethodto emit lifecycle events:connecting,open,message-in,message-out,closed. Tracks payload sizes and clock timestamps for each event.ResourceType.WEBSOCKET = 'websocket'to the resource type enum.ExperimentalFeature.TRACK_WEB_SOCKETS = 'track_web_sockets'experimental flag.packages/browser-rum-corewebSocketCollection: ConsumeswebSocketObservableand maintains an in-memory registry of open connections. Aggregates metrics (message counts/sizes, first-message offsets, longest inbound silence, buffered amount peaks) and emitsWEBSOCKET_COMPLETEDlifecycle events on close or session expiry.resourceCollection: Subscribes toWEBSOCKET_COMPLETEDand assemblesRumResourceEventobjects withresource.type: "websocket"and aresource.websocketsub-object containing the full connection metrics.lifeCycle: Adds theWEBSOCKET_COMPLETEDevent type and its payload type.rawRumEvent.types: AddsWebSocketResourcePropertiesinterface and wires it intoRawRumResourceEvent.startRum: StartswebSocketCollectionwhen theTRACK_WEB_SOCKETSexperimental feature is enabled.scripts/dev-serverAccess-Control-Allow-Origin: *response header to support cross-origin WebSocket testing from the sandbox.Test instructions
Run unit tests:
Manual end-to-end test:
Open
http://localhost:8080and replacesandbox/index.htmlwith:Click the button, wait ~3 seconds, then open a new tab to flush events:
Expected output — a resource event with:
{ "resource": { "type": "websocket", "url": "ws://localhost:8765", "websocket": { "handshake_succeeded": true, "messages_in": { "count": 2, "size": 22 }, "messages_out": { "count": 2, "size": 22 }, "close_code": 1000, "was_clean": true, "tracking_end_reason": "close_event" } } }Checklist